Classes of Multiagent Q-learning Dynamics with ε-greedy Exploration

Authors

  • Michael Wunder
  • Michael L. Littman
  • Monica Babes-Vroman
Abstract

Q-learning in single-agent environments is known to converge in the limit given sufficient exploration. The same algorithm has been applied, with some success, in multiagent environments, where traditional analysis techniques break down. Using established dynamical systems methods, we derive and study an idealization of Q-learning in 2-player 2-action repeated general-sum games. In particular, we address the discontinuous case of ε-greedy exploration and use it as a proxy for value-based algorithms to highlight a contrast with existing results in policy search. Analogously to previous results for gradient ascent algorithms, we provide a complete catalog of the convergence behavior of the ε-greedy Q-learning algorithm by introducing new subclasses of these games. We identify two subclasses of Prisoner's Dilemma-like games where the application of Q-learning with ε-greedy exploration results in higher-than-Nash average payoffs for some initial conditions.
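To make the setting concrete, here is a minimal Python sketch of two independent, stateless ε-greedy Q-learners repeatedly playing a Prisoner's Dilemma. It simulates sampled play rather than the idealized dynamical system the paper derives, and the payoff values, the learning rate `alpha`, discount `gamma`, exploration rate `epsilon`, the tie-breaking rule, and the helper names `epsilon_greedy`, `q_update`, and `simulate` are all illustrative assumptions, not taken from the paper.

```python
import random

# Hypothetical Prisoner's Dilemma payoffs (assumed values, not from the paper).
# Action 0 = cooperate, 1 = defect; PAYOFF[my_action][other_action].
PAYOFF = [[3.0, 0.0],
          [5.0, 1.0]]


def epsilon_greedy(q, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    # max() returns the first maximal index, so ties go to action 0 (an assumption).
    return max(range(len(q)), key=lambda a: q[a])


def q_update(q, action, reward, alpha, gamma):
    """Stateless Q-learning: the repeated game has a single state that recurs every round."""
    target = reward + gamma * max(q)
    q[action] += alpha * (target - q[action])


def simulate(q1, q2, steps=10_000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Two independent learners play the symmetric game; returns average payoffs and final Q-values."""
    rng = random.Random(seed)
    q1, q2 = list(q1), list(q2)  # copy so the caller's initial conditions are untouched
    total1 = total2 = 0.0
    for _ in range(steps):
        a1 = epsilon_greedy(q1, epsilon, rng)
        a2 = epsilon_greedy(q2, epsilon, rng)
        r1 = PAYOFF[a1][a2]
        r2 = PAYOFF[a2][a1]  # symmetric game: player 2 sees the same matrix from its side
        q_update(q1, a1, r1, alpha, gamma)
        q_update(q2, a2, r2, alpha, gamma)
        total1 += r1
        total2 += r2
    return total1 / steps, total2 / steps, q1, q2


if __name__ == "__main__":
    # Compare two initial conditions with exploration switched on.
    print(simulate([0.0, 0.0], [0.0, 0.0])[:2])
    print(simulate([0.0, 1.0], [0.0, 1.0])[:2])
```

Running `simulate` with `epsilon=0.0` from `[0.0, 0.0]` versus `[0.0, 1.0]` shows how the initial Q-values alone can lock both learners into mutual cooperation or mutual defection; with exploration switched on, the long-run averages depend on the chosen parameters and are not reproduced here.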


Similar resources

Modelling the dynamics of multiagent Q-learning with ε-greedy exploration

We present a framework to model the dynamics of Multiagent Q-learning with ε-greedy exploration. The applicability of the framework is tested through experiments in typical games selected from the literature.


Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

Achieving efficient and scalable exploration in complex domains poses a major challenge in reinforcement learning. While Bayesian and PAC-MDP approaches to the exploration problem offer strong formal guarantees, they are often impractical in higher dimensions due to their reliance on enumerating the state-action space. Hence, exploration in complex domains is often performed with simple epsilon...


Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax

This paper proposes “Value-Difference Based Exploration combined with Softmax action selection” (VDBE-Softmax) as an adaptive exploration/exploitation policy for temporal-difference learning. The advantage of the proposed approach is that exploration actions are only selected in situations when the knowledge about the environment is uncertain, which is indicated by fluctuating values during lea...


The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems

Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to lear...


Ipseity: An Open-Source Platform for Synthesizing and Validating Artificial Cognitive Systems in MAS (Demonstration)

This article presents an overview of Ipseity, an open-source platform developed in C++ with the Qt framework. The current version of the platform includes a set of plugins implementing single-agent and multi-agent environments, hardcoded controllers based on Artificial Intelligence (AI) techniques, classical Reinforcement Learning (RL) techniques like Q-Learning, Sarsa, Epsilon-Greedy combined ...



Publication date: 2010